INTRODUCTION


It is often difficult and time-consuming, if not computationally impossible, to
locate a failed component in a large, complex system. Recently, the U.S. Army Research
and Technology Laboratories at Moffett Field, California, established a theory
stating that the minimum number of test points required for conclusive detection of
system failure is equal to the total number of terminal test points; this set of
points constitutes the optimal choice. In this report we develop an optimal
diagnostic strategy for finding a failed component in a malfunctioning system.

Four problems intrinsically related to this strategy are given as an introduction.
The first problem (ref. 2) deals with finding a selected object from a given
set, each object of which has a known probability of being chosen. The problem is
solved using a series of yes-no questions, arbitrarily dividing the set of objects
into two groups and asking in which group the selected object is contained. The
yes-no questioning process continues until the selected object is known. The second
problem is the gold-coin-in-the-box problem (ref. 2). Briefly, a gold coin is
concealed in one of m boxes of copper coins, indexed i = 1, 2, 3, . . ., m.
Assuming that the probability of finding the gold coin in each box and the associated
search cost are known, one must choose a strategy that maximizes the probability of
finding the gold coin under a budget ceiling; the strategy may search the set of boxes
repeatedly until the coin is found. The third problem (ref. 3) does not allow for
the repeated search of the gold-coin-in-the-box problem. This problem addresses how
to find the state of every component in a k/n system (an n-component system that
functions if at least k of its n components are functioning) in minimum time, given
that the system is not functioning. Finally, the last problem (ref. 4) is stated as
follows: Given an n-component functional system in which numerous components are
functionally interdependent, determine a diagnostic strategy for conclusive detection
of the operational status of the system with the least expenditure.

The present paper concerns itself with the problem of optimally finding a failed
component in a malfunctioning system. A concise statement of the problem, together
with the main result, will be given in the third section, entitled Admissible
Diagnostic Strategies, after some relevant definitions and assumptions have been
introduced in the second section. The fourth section contains illustrative examples
for determining the optimal strategies based on the theorem obtained in the third
section.


DEFINITIONS AND ASSUMPTIONS 


In this section we shall state explicitly the definitions and assumptions upon
which our strategy is based. These definitions and assumptions are generally accepted
in the mathematical sciences and engineering communities; see, for example,
references 5-7.


Definitions 

Failure: A condition characterized by the inability of a material, structure, or
system to fulfill its intended purpose or task.

Malfunction: Failure to operate in the normal or expected manner, or level of
performance.

Test: An observation or measurement procedure that provides sufficient information
to determine whether or not all members of a particular subset of elements are
functioning properly.

Coherent systems: Those systems for which the replacement of a failed component
by a functioning one will not induce a functioning system to fail. (A rigorous
mathematical definition is given in appendix A.)

Strategy: An ordering of the components which are tested sequentially in the
predetermined order until a failed component can be found.

Admissible strategy: A strategy whose expected expenditure is minimum.

Optimal strategy: An admissible strategy which obeys Bellman's Principle of
Optimality.

Bellman's Principle of Optimality: Whatever the initial state and initial decision
are, the remaining decisions must constitute an optimal policy with regard to the
state resulting from the first decision.


Assumptions 


It is hypothesized that 

1. At any instant or stage, the system or equipment under consideration may be
in only one of two states: functioning or faulty.

2. The system can be schematically decomposed into a finite number of components 
(or modules), each of which, at any instant, is in one of the two possible states. 

3. The state of the system depends solely on the states of its components. 

Hypothesis (1) is a realistic assumption because if the performance level of a given
component is degraded to an "unsatisfactory" level (or beyond the tolerance as cited
in the specifications of the equipment), then the component is in the malfunctioning
state. Hypothesis (2) demands only the feasibility of schematic system decomposition,
not necessarily a physical decomposition. Hypothesis (3) tacitly assumes that a
proper environment exists for the system under question.


ADMISSIBLE DIAGNOSTIC STRATEGIES 


Before we discuss the main result of this paper, it is necessary to restate
concisely the problem under consideration: Suppose that we are given an n-component,
coherent, malfunctioning system for which the component reliability and the associated
test time are known. A malfunctioning component must be found such that the expected
test time is optimal in the sense of Bellman's Principle of Optimality.

The following lemma plays an important role in the subsequent deductive process. 
(The mathematical proof is provided in appendix B.) 
Lemma. Given that an n-component, coherent system is not functioning, and
$p_i$ and $t_i$, i = 1, 2, 3, . . ., n, are the reliability and test time of component i,
respectively, then the expected time $T(\sigma)$ of a strategy $\sigma$ for finding a
malfunctioning component of the system is given by

$$T(\sigma) = \sum_{i=1}^{n-1} \left[ \prod_{k=0}^{i-1} P(k) \right] t_i \qquad (*)$$

where

$$P(0) \equiv 1$$

and the conditional probability

$$P(k) = P\left( k \,\middle|\, S \wedge \bigwedge_{m=1}^{k-1} m \right)$$

is the probability that the kth component of $\sigma$ is functioning, given that the
system is down (denoted S) and all the first k - 1 components are functioning.
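As a reading aid, the sum in the lemma can be evaluated directly once the conditional probabilities are known. The following is a minimal Python sketch and is not part of the original report; the function name `expected_time` and its argument layout are illustrative assumptions. The sample values are the rounded figures from the four-component example given later in this paper.

```python
from math import prod

def expected_time(test_times, cond_probs):
    """Evaluate the lemma's sum for one strategy (hypothetical helper).

    test_times[i] -- test time of the (i+1)-th component in the strategy
    cond_probs[k] -- P(k+1): probability that the (k+1)-th component is
                     functioning, given that the system is down and all
                     earlier components in the strategy are functioning
    The last component is never tested, so only the first n-1 times count.
    """
    n = len(test_times)
    total = 0.0
    for i in range(n - 1):
        # Probability of reaching this test: P(0)P(1)...P(i); empty product is 1.
        reach = prod(cond_probs[:i])
        total += reach * test_times[i]
    return total

# Test times 2, 2, 3, 4 with P(1) = 0.7092 and P(2) = 0.6310
print(round(expected_time([2, 2, 3, 4], [0.7092, 0.6310]), 4))
```

Only the first n - 2 conditional probabilities enter the sum, which is why two values suffice for a four-component strategy.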


With this lemma at our disposal, it is a trivial exercise to deduce the following
theorem (thus, the proof is omitted).


Theorem. Given that an n-component, coherent system is down, the corresponding
reliability and test time are known, and it is necessary to find a malfunctioning
component of the system, then the following statements are equivalent:

1. Strategy s is admissible

2. $T(s) \le T(\sigma)$, $\forall \sigma \in \Sigma$

where $\Sigma$ is the set of all orderings of the n components.


The main drawbacks of the theorem include the amount of algebraic computation
and enumeration, and the bookkeeping of the set $\Sigma$ of all orderings (e.g., for
n = 15, $\Sigma$ contains more than 1.3 × 10^12 strategies). Therefore, even for a
"moderate" value of n, the computation would defy the capability of a present-day
electronic computer. Since the number of strategies grows factorially as a function
of n, an obvious remedy to overcome this difficulty is to partition the system into a
sequence of nested levels (or subsystems) in which the number of components involved
in each level is computationally manageable. Unfortunately, the resulting strategy is
not necessarily admissible or optimal in the global sense, even though it is admissible
and optimal locally. A better way to solve this combinatorial problem is to reduce the
computational complexity of statement 2 of the above theorem by finding an equivalent
condition. This would simplify the computation immensely while preserving the global
admissibility and optimality. (This result will be published in a subsequent paper.)
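The factorial growth mentioned above is easy to tabulate. The short Python sketch below is an editorial illustration; the helper name `count_strategies` is an assumption, and the code simply counts the orderings of n components.

```python
from math import factorial

def count_strategies(n):
    """Number of orderings of n components, i.e. the size of the strategy set."""
    return factorial(n)

for n in (4, 10, 15, 20):
    print(f"n = {n:2d}: {count_strategies(n):,} strategies")
```

For n = 15 the count is 1,307,674,368,000, which agrees with the figure of more than 1.3 × 10^12 quoted in the text.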
EXAMPLES FOR FINDING OPTIMAL STRATEGIES


The main result of the preceding section provides a means of determining
optimal diagnostic strategies for finding a malfunctioning component in a failed
coherent system.


We begin with an intuitively simple example. Suppose that a four-component
coherent system is not functioning. Let α, β, γ, and δ be the components whose
reliabilities p and test times t are as follows:


Component      p      t

    α         0.9     2
    β         0.9     4
    γ         0.9     2
    δ         0.9     3


Then for this example the set $\Sigma$ of all orderings contains 24 strategies:

A = <α, β, γ, δ>        M = <β, γ, α, δ>
B = <δ, α, β, γ>        N = <β, δ, α, γ>
C = <γ, δ, α, β>        O = <γ, α, β, δ>
D = <β, γ, δ, α>        P = <γ, α, δ, β>
E = <β, δ, γ, α>        Q = <δ, α, γ, β>
F = <δ, γ, β, α>        R = <β, α, δ, γ>
G = <γ, δ, β, α>        S = <β, α, γ, δ>
H = <δ, β, γ, α>        U = <α, β, δ, γ>
I = <γ, β, δ, α>        V = <α, γ, δ, β>
J = <δ, γ, α, β>        W = <α, δ, γ, β>
K = <δ, β, α, γ>        X = <α, γ, β, δ>
L = <γ, β, α, δ>        Y = <α, δ, β, γ>


Also, the conditional probabilities are as follows. (In what follows all numbers
have been rounded off to four decimal places.)

P(α | S)   = 0.7092        P(β | S∧γ) = 0.6310
P(β | S)   = 0.7092        P(β | S∧δ) = 0.6310
P(γ | S)   = 0.7092        P(γ | S∧α) = 0.6310
P(δ | S)   = 0.7092        P(γ | S∧β) = 0.6310
P(α | S∧β) = 0.6310        P(γ | S∧δ) = 0.6310
P(α | S∧γ) = 0.6310        P(δ | S∧α) = 0.6310
P(α | S∧δ) = 0.6310        P(δ | S∧β) = 0.6310
P(β | S∧α) = 0.6310        P(δ | S∧γ) = 0.6310
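The report does not state the structure function used in this example. The tabulated values are, however, reproduced by a series system of independent components (one that is down whenever any component has failed), and the Python sketch below rests on that assumption. The function name `cond_functioning` and the ASCII component names are illustrative and do not come from the original text.

```python
from math import prod

def cond_functioning(p, k, known_ok=()):
    """P(component k is functioning | system down, known_ok all functioning).

    ASSUMPTION: series system of independent components, with reliabilities
    given by the dict p. This structure is inferred from the tabulated
    numbers; it is not stated in the report.
    """
    rest = [c for c in p if c != k and c not in known_ok]
    rest_all_ok = prod(p[c] for c in rest)   # every other untested component works
    # Numerator: k works and at least one of the remaining components has failed.
    numerator = p[k] * (1.0 - rest_all_ok)
    # Denominator: at least one of k and the remaining components has failed.
    denominator = 1.0 - p[k] * rest_all_ok
    return numerator / denominator

p = {"alpha": 0.9, "beta": 0.9, "gamma": 0.9, "delta": 0.9}
print(round(cond_functioning(p, "alpha"), 4))             # compare with P(α | S)
print(round(cond_functioning(p, "beta", ("alpha",)), 4))  # compare with P(β | S∧α)
```

Because every component has the same reliability here, the result depends only on how many components are already known to be functioning, which is why the table contains just two distinct values.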


and the expected times of the strategies are:

T(A) = 5.7318        T(I) = 6.1793        T(Q) = 5.3134
T(B) = 6.2084        T(J) = 5.3134        T(R) = 6.7609
T(C) = 5.0226        T(K) = 6.7318        T(S) = 6.3134
T(D) = 6.7609        T(L) = 5.7318        T(U) = 6.1793
T(E) = 7.0226        T(M) = 6.3134        T(V) = 4.7609
T(F) = 6.2084        T(N) = 7.0226        T(W) = 5.0226
T(G) = 5.9176        T(O) = 5.2084        T(X) = 5.2084
T(H) = 6.7318        T(P) = 4.7609        T(Y) = 5.9176
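The exhaustive search implied by statement 2 of the theorem can be sketched in a few lines of Python for this four-component case. The sketch assumes a series system of independent components, which is an inference from the tabulated values rather than something the report states. The names `RELIABILITY`, `TEST_TIME`, and `expected_time` are hypothetical.

```python
from itertools import permutations
from math import prod

RELIABILITY = {"alpha": 0.9, "beta": 0.9, "gamma": 0.9, "delta": 0.9}
TEST_TIME = {"alpha": 2, "beta": 4, "gamma": 2, "delta": 3}

def expected_time(order, p, t):
    """Lemma's sum for one ordering, assuming a series system (editorial assumption)."""
    total, reach = 0.0, 1.0          # reach = P(0)P(1)...P(i-1)
    for i, c in enumerate(order[:-1]):
        total += reach * t[c]
        rest_ok = prod(p[d] for d in order[i + 1:])
        # P(c functioning | system down, all earlier components functioning)
        reach *= p[c] * (1.0 - rest_ok) / (1.0 - p[c] * rest_ok)
    return total

times = {order: expected_time(order, RELIABILITY, TEST_TIME)
         for order in permutations(RELIABILITY)}
best = min(times.values())
admissible = sorted(o for o, v in times.items() if abs(v - best) < 1e-9)

print(len(times), "strategies enumerated")
for order in admissible:
    print(order, round(times[order], 4))
```

The sketch uses unrounded conditional probabilities, so its values can differ from the tabulated ones in the fourth decimal place.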


The minimum value is attained by strategies P and V. By definition, both P and V 
are admissible. To determine optimality, we compute for each admissible strategy the 
expected time at each of the three subsequences of tests. 


Admissible     Expected time of     Expected time of     Expected time of
 strategy         first test        first two tests      first three tests

    P               2.0000               3.4184               4.7609
    V               2.0000               3.4184               4.7609


In this case, the corresponding expected times at each stage are equal. Thus, P and
V are also optimal. This is not surprising for two reasons: (1) the component test
times are the deciding parameters for the system whose entropy is maximum, and (2) any
admissible strategy with nondecreasing, termwise-smallest sequence of component test
times is optimal.
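The stage-by-stage comparison used above amounts to taking partial sums of the lemma's terms. A minimal Python sketch follows, again assuming a series system of independent components (an editorial assumption) and using the hypothetical helper name `stage_times`.

```python
from itertools import accumulate
from math import prod

def stage_times(order, p, t):
    """Cumulative expected time after the first 1, 2, ..., n-1 tests.

    ASSUMPTION: series system of independent components.
    """
    terms, reach = [], 1.0
    for i, c in enumerate(order[:-1]):
        terms.append(reach * t[c])               # expected cost of the (i+1)-th test
        rest_ok = prod(p[d] for d in order[i + 1:])
        reach *= p[c] * (1.0 - rest_ok) / (1.0 - p[c] * rest_ok)
    return list(accumulate(terms))

p = {"alpha": 0.9, "beta": 0.9, "gamma": 0.9, "delta": 0.9}
t = {"alpha": 2, "beta": 4, "gamma": 2, "delta": 3}

stages_P = stage_times(("gamma", "alpha", "delta", "beta"), p, t)
stages_V = stage_times(("alpha", "gamma", "delta", "beta"), p, t)
print([round(x, 4) for x in stages_P])
print([round(x, 4) for x in stages_V])
```

Two admissible strategies with identical partial sums at every stage, as here, cannot be separated by the stage-wise criterion.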


For the next example, let us reconsider the last problem using the following 
reliability data: 


Component      p

    α         0.9
    β         0.8
    γ         0.6
    δ         0.8


For this example the conditional probabilities are:

P(α | S)   = 0.8472        P(β | S∧γ) = 0.5284
P(β | S)   = 0.6944        P(β | S∧δ) = 0.6479
P(γ | S)   = 0.3887        P(γ | S∧α) = 0.3506
P(δ | S)   = 0.6944        P(γ | S∧β) = 0.2958
P(α | S∧β) = 0.8239        P(γ | S∧δ) = 0.2958
P(α | S∧γ) = 0.7643        P(δ | S∧α) = 0.6753
P(α | S∧δ) = 0.8239        P(δ | S∧β) = 0.6479
P(β | S∧α) = 0.6753        P(δ | S∧γ) = 0.5284


and the expected times of the strategies are:

T(A) = 6.5330        T(I) = 4.1710        T(Q) = 5.5330
T(B) = 6.6773        T(J) = 4.7996        T(R) = 7.1051
T(C) = 3.5769        T(K) = 6.6774        T(S) = 6.5330
T(D) = 6.0050        T(L) = 3.9656        T(U) = 7.1051
T(E) = 6.9830        T(M) = 5.7996        T(V) = 4.5855
T(F) = 5.2104        T(N) = 6.9830        T(W) = 5.6858
T(G) = 3.9877        T(O) = 3.9657        T(X) = 4.8825
T(H) = 6.6774        T(P) = 3.6687        T(Y) = 6.8301


Obviously, strategy C is the only admissible strategy, and hence it is optimal also. 
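As a check on the arithmetic, the lemma can be applied by hand to strategy C = <γ, δ, α, β>, using the test times of the first example (2, 4, 2, and 3 for α, β, γ, and δ) and the rounded conditional probabilities tabulated above. This worked evaluation is an editorial illustration and is not part of the original derivation:

```latex
\begin{aligned}
T(C) &= t_\gamma + P(\gamma \mid S)\, t_\delta
        + P(\gamma \mid S)\, P(\delta \mid S \wedge \gamma)\, t_\alpha \\
     &= 2 + (0.3887)(3) + (0.3887)(0.5284)(2) \\
     &\approx 2 + 1.1661 + 0.4108 = 3.5769
\end{aligned}
```

The next smallest tabulated value is T(P) = 3.6687, so C is the unique minimizer among the 24 orderings.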

It is interesting to note that the optimal strategy, C = <γ, δ, α, β>, begins
with γ, the least reliable component and one with the smallest test time, as the
first component to be tested; next in the test sequence is δ, which is the next most
unreliable and costly component. Between the remaining components α and β, C yields α
for the third position; yet β is less reliable than α. This apparent contradiction to
intuition can be explained by reexamining the preceding lemma, which asserts that the
expected test time of the last component is zero. That is to say, equivalently,
between the last two components, an optimal strategy always chooses the one having a
smaller test time regardless of their reliability data.
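The last remark can be made explicit. Suppose two strategies σ and σ' agree in their first n - 2 positions and differ only in which of the two remaining components, x or y, is tested in position n - 1. Subtracting the two sums given by the lemma, every term but the last cancels. The derivation below is added for illustration; the symbols σ', x, and y are not from the original text, and the numbers are the tabulated values for strategies G = <γ, δ, β, α> and C = <γ, δ, α, β>:

```latex
\begin{aligned}
T(\sigma) - T(\sigma') &= \Bigl[\, \prod_{k=0}^{n-2} P(k) \Bigr] (t_x - t_y) \\
T(G) - T(C) &= P(\gamma \mid S)\, P(\delta \mid S \wedge \gamma)\, (t_\beta - t_\alpha) \\
            &= (0.3887)(0.5284)(4 - 2) \approx 0.4108 = 3.9877 - 3.5769
\end{aligned}
```

The bracketed product is nonnegative and does not involve the reliabilities of x or y, so the sign of the difference is set by the two test times alone.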


CONCLUSIONS 


An optimal diagnostic strategy for finding a failed component in an n-component, 
coherent, malfunctioning system is presented. It was found that even for a moderate 
value of n the amount of algebraic computation and enumeration of the set of all 
possible strategies becomes computationally infeasible because of its factorial- 
growth character. An obvious solution to this problem is to partition the system 
into a sequence of nested subsystems, in which the number of components involved in 
each subsystem is computationally manageable. Unfortunately, this strategy is not 
necessarily optimal or even admissible in the global sense. 
APPENDIX A


A MATHEMATICAL DEFINITION OF COHERENT SYSTEMS 


Let S be the set of all n components of an n-component system. Then a state
function X on S onto the set {0,1} is defined as follows:

$$X(c) = \begin{cases} 1, & \text{if component } c \text{ is functioning} \\ 0, & \text{if component } c \text{ is faulty} \end{cases}$$

A structure function $\phi$ on the set of all n-tuples, X(S), onto {0,1} is defined
by

$$\phi(a) = \begin{cases} 1, & \text{if the system is functioning} \\ 0, & \text{if the system is faulty} \end{cases} \qquad a \in X(S)$$

A structure function $\phi$ is monotone if it has the following property:

$$\forall\, a, b \in X(S), \quad a \le b \;\Rightarrow\; \phi(a) \le \phi(b)$$

A coherent system is a system whose structure function is monotone.
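For small n the monotonicity condition can be checked by brute force over all pairs of state vectors. The Python sketch below is illustrative only; `is_monotone` and the three sample structure functions (`series`, `k_out_of_n`, `parity`) are hypothetical names chosen for this example.

```python
from itertools import product

def is_monotone(phi, n):
    """True if a <= b (componentwise) always implies phi(a) <= phi(b)."""
    states = list(product((0, 1), repeat=n))
    for a in states:
        for b in states:
            if all(x <= y for x, y in zip(a, b)) and phi(a) > phi(b):
                return False
    return True

def series(x):
    """System functions only if every component functions."""
    return int(all(x))

def k_out_of_n(k):
    """Structure function of a k/n system: functions if at least k components do."""
    return lambda x: int(sum(x) >= k)

def parity(x):
    """A non-coherent example: repairing a component can bring the system down."""
    return sum(x) % 2

print(is_monotone(series, 4))         # series systems are coherent
print(is_monotone(k_out_of_n(2), 4))  # so are k/n systems
print(is_monotone(parity, 3))         # parity violates monotonicity
```

The check visits 4^n pairs of states, so it is practical only for small systems.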
APPENDIX B


A MATHEMATICAL PROOF FOR

$$T(s_n) = \sum_{i=1}^{n-1} \left[ \prod_{k=0}^{i-1} P(k) \right] t_i$$


Lemma. Given that an n-component, coherent system is not functioning, and
$p_i$ and $t_i$, i = 1, 2, 3, . . ., n, are the reliability and test time of component i,
respectively, then the expected time $T(s_n)$ of a strategy $s_n$ for finding a
malfunctioning component of the system is given by

$$T(s_n) = \sum_{i=1}^{n-1} \left[ \prod_{k=0}^{i-1} P(k) \right] t_i \qquad (*)$$

where

$$P(0) \equiv 1$$

and the conditional probability

$$P(k) = P\left( k \,\middle|\, S \wedge \bigwedge_{m=1}^{k-1} m \right)$$

is the probability that the kth component of $s_n$ is functioning, given that the
system is not functioning, and all the first k - 1 components are functioning.

Proof: We prove the expression (*) by mathematical induction on n. To show
the basis of induction, let n = 1 (i.e., a one-component system). In this case,
$T(s_1) = 0$, since no test is needed. Also, for n = 1, the right-hand side of (*) is
zero because a null sum is defined to be zero. Next, by inductive hypothesis, (*) is
valid for an n-component system. It remains to be shown that the inductive step is
valid whenever the inductive hypothesis is valid; that is, we must prove that for an
(n + 1)-component system


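$$T(s_{n+1}) = \sum_{i=1}^{(n+1)-1} \left[ \prod_{k=0}^{i-1} P(k) \right] t_i$$

Note that

$$\begin{aligned}
\sum_{i=1}^{(n+1)-1} \left[ \prod_{k=0}^{i-1} P(k) \right] t_i
  &= \sum_{i=1}^{n-1} \left[ \prod_{k=0}^{i-1} P(k) \right] t_i + P(1)P(2) \cdots P(n-1)\, t_n \\
  &= T(s_n) + P(1)P(2) \cdots P(n-1)\, t_n \\
  &= T(s_n) + P(1)P(2) \cdots P(n-1) \left\{ t_n + P(n)\,\tau_0(s_{n+1}) + [1 - P(n)]\,\tau_1(s_{n+1}) \right\}
\end{aligned}$$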
The next to the last equality holds by the inductive hypothesis, and, given that the
system is not functioning and the first n components are functioning, the last
equality follows from the fact that the expected remaining test time (denoted by
$\tau_0(s_{n+1})$) is zero. Likewise the expected remaining test time, $\tau_1(s_{n+1})$,
is zero if the first n - 1 components are functioning and the nth component is not
functioning. The quantity in the braces in the last term is precisely the expected time
for testing the nth component in $s_{n+1}$. Therefore, together with the first term,
$T(s_n)$, it constitutes the expected test time of $s_{n+1}$; that is,

$$T(s_n) + P(1)P(2) \cdots P(n-1) \left\{ t_n + P(n)\,\tau_0(s_{n+1}) + [1 - P(n)]\,\tau_1(s_{n+1}) \right\} = T(s_{n+1})$$

Hence, by the principle of mathematical induction, the assertion follows.
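The lemma can also be checked numerically against a direct enumeration of component states. The Python sketch below does this for a series system of independent components, which is an editorial assumption consistent with the tabulated examples rather than a structure stated in the report. The function names `lemma_time` and `enumerated_time` are hypothetical, and the data are those of the second example.

```python
from itertools import permutations, product
from math import prod

def lemma_time(order, p, t):
    """Closed-form sum from the lemma (series system assumed)."""
    total, reach = 0.0, 1.0
    for i, c in enumerate(order[:-1]):
        total += reach * t[c]
        rest_ok = prod(p[d] for d in order[i + 1:])
        reach *= p[c] * (1.0 - rest_ok) / (1.0 - p[c] * rest_ok)
    return total

def enumerated_time(order, p, t):
    """Average test time over all component states in which the system is down."""
    weighted, mass = 0.0, 0.0
    for state in product((0, 1), repeat=len(order)):
        if all(state):
            continue                         # system functioning: excluded by conditioning
        w = prod(p[c] if s else 1.0 - p[c] for c, s in zip(order, state))
        spent = 0.0
        for c, s in zip(order[:-1], state):  # the last component is never tested
            spent += t[c]
            if not s:                        # failed component found, stop testing
                break
        weighted += w * spent
        mass += w
    return weighted / mass

p = {"alpha": 0.9, "beta": 0.8, "gamma": 0.6, "delta": 0.8}
t = {"alpha": 2, "beta": 4, "gamma": 2, "delta": 3}

worst_gap = max(abs(lemma_time(o, p, t) - enumerated_time(o, p, t))
                for o in permutations(p))
print("largest discrepancy over 24 orderings:", worst_gap)
print(round(lemma_time(("gamma", "delta", "alpha", "beta"), p, t), 4))
```

Agreement between the two routines for every ordering is a sanity check on the formula, not a substitute for the induction argument above.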